(54) System-on-chip with time redundancy operation



(57) This patent describes a fail-safe operating system
for microprocessor based System-on-Chips, giving
System-on-Chips considerable tolerance to intermittent
failures. The problem this patent aims to solve is the
vulnerability of current System-on-Chips when operating
outside the normal working conditions. The goal of this
patent is to make System-on-Chips much safer and more
robust than System-on-Chips with normal operating
systems.

The above mentioned drawbacks are overcome
with a System-on-Chip comprising an operating system
designed to execute jobs in a sequential manner, each
job having a set of input data/conditions and output
data/conditions, characterized in that it comprises means
to repeat jobs at least twice for each set of input data,
comparison means to validate the results from repeated
jobs by checking the output data for equivalency,
means to continue the execution in case of successful
comparison, and means to launch an exception handler
in case of unsuccessful comparison.
Description

[0001] The present invention concerns an operating
system running in a microprocessor based
System-on-Chip (MbSoC), and in particular, an operating
system having a certain degree of fault acceptance.
[0002] By "System-on-Chip", it is meant an electronic
module packaged as a single chip, in which all the
elements to achieve a particular function are integrated. An
example of such a module is a self-contained GPS chip
which encompasses the analog and digital treatment to
determine the x,y,z positions. This chip is then integrated
in a host apparatus, e.g. a portable telephone.
Another example of such a module is a smart card or SIM
module, which integrates on a single substrate the
input/output interface, the clock driver, the cryptographic
module, and the microprocessor with its volatile and
non-volatile memories.

[0003] By "Operating System", it is meant a central
program/process managing a computer's resources
and controlling the processes running in it. In a
System-on-Chip the applications are often indistinguishable
from the operating system kernel, and in the context of
this patent the applications (code and data) are
considered to be a part of the operating system.
[0004] This patent describes a fail-safe operating system
for System-on-Chips, giving System-on-Chips
considerable tolerance to intermittent failures. The problem
this patent aims to solve is the vulnerability of current
System-on-Chips when operating outside the normal
working conditions. The goal of this patent is to make
System-on-Chips much safer and more robust than
System-on-Chips with normal operating systems.
[0005] Today's System-on-Chips are programmed
much like ordinary computers, i.e. they have a normal
operating system. Usually a basic assumption of the
software designer is that the microprocessor (CPU) will
execute program instructions deterministically and
reliably, and that memory will retain data without problems.
Unfortunately, this basic assumption does not always
hold, and a single instruction execution failure can
possibly result in a major security breach or fatal destruction
of the System-on-Chip's memory content. Whereas
dangerous failures can occur for natural reasons,
failures can also be intentionally provoked by hackers
trying to break or bypass the security mechanisms of the
System-on-Chip.

[0006] To make System-on-Chips better and safer,
they can advantageously be equipped with a fail-safe
operating system as described in this patent. A
System-on-Chip can be considered fail-safe when most cases
of instruction execution failures and data retention
failures are detected and corrected, before said failures
cause irreversible internal damage or compromise the
security of the System-on-Chip or the overall system in
which the System-on-Chip is being used.
[0007] Intermittent failures that should be successfully
detected and handled by the operating system include
but are not limited to:

• Instruction execution failures
• Memory read failures
• Memory write failures
• Memory data retention failures
• Register data retention failures
• Software corruptions
• Hardware failures
• Sensor failures

[0008] The reasons for these failures include but are
not limited to:

• Bad System-on-Chip contacts (dirt, vibration, etc.)
• Power supply noise (glitch attacks, thunderstorms,
etc.)
• Clock noise (glitch attacks, bad clock signal, etc.)
• Cosmic radiation (alpha particles, etc.)
• Design flaws (hardware and software)
• Manufacturing flaws
• Microprobing

[0009] Since the failures and their causes are
numerous and not all understood or maybe not even
suspected, special techniques must be used to reliably
detect failures and to take corrective action before harm
is done.

[0010] The above mentioned drawbacks are overcome
with a System-on-Chip comprising an operating
system designed to execute jobs in a sequential manner,
each job having a set of input data/conditions and
output data/conditions, characterized in that it comprises
means to repeat jobs at least twice for each set of
input data, comparison means to validate the results
from repeated jobs by checking the output data for
equivalency, means to continue the execution in
case of successful comparison, and means to launch
an exception handler in case of unsuccessful comparison.

[0011] By "jobs" it is meant the execution of a process
or a partial process to perform functions such as
computing an address, encrypting a block of data, reading a
database record, etc. The jobs in a process are executed
sequentially. Unless otherwise specified, the term "job"
designates an unsafe job.

[0012] By "unsafe jobs" it is meant jobs where failures
are not inherently detectable by the job itself. For
example, a pure encryption job is unsafe unless the program
includes a subsequent decryption and makes sure the
original clear text is obtained by said decryption.
[0013] By "process" it is meant the execution of a
program with a certain data set. A process consists of one
or more jobs, usually executed sequentially. The
division of a process into jobs is done so that each job can
be executed twice with the same data set for verification
purposes.

[0014] In a preferred embodiment of the patent the
operating system executes all jobs twice and compares the
results. If the results are equivalent it can be assumed
that both jobs have been able to finish without failures,
and that consequently the results are valid. In case an
even higher degree of confidence is required, jobs may
be executed more than twice.
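
The double execution described above can be illustrated with a minimal sketch. It is not taken from the patent: the names `run_redundant`, `JobMismatch` and `compute_address` are hypothetical, and a job is assumed to be a plain function of its input data.

```python
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class JobMismatch(Exception):
    """Hypothetical signal that repeated executions of a job disagree."""


def run_redundant(job: Callable[[T], R], data: T, repeats: int = 2) -> R:
    """Execute `job` `repeats` times on the same input and compare the outputs."""
    if repeats < 2:
        raise ValueError("time redundancy needs at least two executions")
    results = [job(data) for _ in range(repeats)]
    first = results[0]
    if any(result != first for result in results[1:]):
        # Do not continue with a doubtful result; leave it to exception handling.
        raise JobMismatch(f"outputs differ: {results!r}")
    return first


def compute_address(record_index: int) -> int:
    # Hypothetical job: map a record index to a physical address.
    return 0x8000 + 16 * record_index
```

Calling `run_redundant(compute_address, 3)` returns the address only when both executions agree; passing `repeats=3` gives the higher degree of confidence mentioned in the text at the cost of one more execution.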
[0015] Instead of letting the operating system
compare the results, it is also possible to let the next job
execute this function, and let said job inform the
operating system if the result is valid or not. This approach
has the advantage of allowing the use of more
sophisticated comparison and verification algorithms.
[0016] The term "equivalency" is used instead of
"identical" when speaking about the comparison. In
some cases, the results of the jobs are correct even if
not identical. This is often the case when a measurement
is performed (e.g. supply voltage) or when external
criteria are determined (e.g. pulse width). A comparison
based on a range is then applied.
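
A range-based comparison could look like the following sketch. The function name `equivalent` and the tolerance values are assumptions made for illustration; the patent does not specify how the range is chosen.

```python
def equivalent(first: float, second: float, tolerance: float) -> bool:
    """Treat two measurements as equivalent when they differ by at most `tolerance`."""
    if tolerance < 0:
        raise ValueError("tolerance must not be negative")
    return abs(first - second) <= tolerance


# Hypothetical supply voltage readings (in volts) from two executions of a job.
accepted = equivalent(3.30, 3.31, tolerance=0.05)
rejected = equivalent(3.30, 3.60, tolerance=0.05)
```

A tolerance of zero reduces this check to the strict "identical" comparison.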
[0017] In case identical results are required, a hashing
value or cyclic redundancy check value (e.g. CRC-16
or CRC-32) of the job's output data is provided to the
operating system for comparison, thus creating a
signature on the output data. This method simplifies the
comparison because the jobs' signature values are compact
(e.g. 16 bits or 32 bits). The method may also reduce
memory requirements considerably. Example: The first
job scans a database and creates a long list of the
physical addresses of all records, plus calculates a CRC-32
value from the addresses in said list as signature. For
memory consumption reasons the second job does not
save the physical addresses in a list, but only uses said
addresses to compute the CRC-32. To check the
correctness of the list created in the first job it is sufficient
to check that the CRC-32 values resulting from both jobs
are identical. In case the CRC-32 values are equal the list
created in the first job is provided as an input to the next job.
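
The database example can be sketched as follows, using the CRC-32 routine from Python's standard `zlib` module. The database model (a list of record sizes starting at a base address), the 32-bit little-endian encoding of each address and all function names are assumptions made for this illustration.

```python
import struct
import zlib
from typing import Iterable, Iterator, List, Tuple


def scan(record_sizes: Iterable[int], base: int = 0x1000) -> Iterator[int]:
    """Hypothetical database scan yielding the physical address of each record."""
    address = base
    for size in record_sizes:
        yield address
        address += size


def first_job(record_sizes: List[int]) -> Tuple[List[int], int]:
    """Build the address list and its CRC-32 signature."""
    addresses: List[int] = []
    crc = 0
    for address in scan(record_sizes):
        addresses.append(address)
        crc = zlib.crc32(struct.pack("<I", address), crc)
    return addresses, crc


def second_job(record_sizes: List[int]) -> int:
    """Recompute only the signature; the addresses themselves are not stored."""
    crc = 0
    for address in scan(record_sizes):
        crc = zlib.crc32(struct.pack("<I", address), crc)
    return crc


def verified_address_list(record_sizes: List[int]) -> List[int]:
    addresses, signature = first_job(record_sizes)
    if signature != second_job(record_sizes):
        raise RuntimeError("CRC-32 signatures differ")
    return addresses  # safe to pass on as input to the next job
```

Only two 32-bit values are compared, and the second execution needs constant memory regardless of the number of records.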
[0018] A process consists of one or more jobs. In case
a process performs database updates it may be
necessary to divide the process into sub-processes or jobs,
all said jobs being executed twice. By dividing the
process into jobs it is possible to ensure that the repeatedly
executed jobs work on identical or equivalent data, and
hence produce equivalent results.
[0019] Executing a job more than once with the same
data set is a method frequently used in life-critical
applications such as aircraft fly-by-wire systems. However,
in such systems the jobs run in parallel on 3 or 4 different
computers, the goal being to survive a computer
breakdown.

[0020] In System-on-Chips, multi-CPU chips are usually
not practical. However, executing the same job twice
is feasible, and is a new method that may increase a
System-on-Chip's fault tolerance and hacker resistance
by orders of magnitude.

[0021] As with fly-by-wire systems it may be considered
advantageous to let the repeated jobs execute different
programs, possibly developed by different programmers.
In this case elusive software and hardware
bugs can be detected.

[0022] Safe jobs are by definition self-verifying. These
jobs do not need to be executed twice. The same is the
case for unsafe jobs which are not life-critical to the
System-on-Chip, such as the calculation of control words
for a descrambling system.

[0023] Another way of verifying the result of an unsafe
job is to let the second job be the inverse operation of
the first job. In this case the output of the second job is
compared with the input of the first job instead of its
output.
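
A minimal sketch of verification by inverse operation follows. The byte-wise additive transformation below is a toy stand-in chosen only because its inverse is obvious; it is not a real cipher and not the patent's method, and all names are hypothetical.

```python
def toy_encrypt(clear: bytes, key: int) -> bytes:
    # Toy transformation for illustration only; offers no security.
    return bytes((byte + key) % 256 for byte in clear)


def toy_decrypt(cipher: bytes, key: int) -> bytes:
    # Inverse of toy_encrypt.
    return bytes((byte - key) % 256 for byte in cipher)


def encrypt_verified(clear: bytes, key: int) -> bytes:
    """First job encrypts; second job decrypts; compare with the first job's input."""
    cipher = toy_encrypt(clear, key)
    if toy_decrypt(cipher, key) != clear:
        raise RuntimeError("inverse operation did not restore the input")
    return cipher
```

Here the comparison is between the output of the second job and the input of the first job, so a fault in either execution shows up as a mismatch.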

[0024] When a job has been executed twice, the
comparing of the results may either validate or invalidate the
result. In case the result is validated, other jobs using
said result as input may be launched. In case the result
is invalid then exception handling is initiated.
[0025] The exception handling may be done in several
ways, depending on the operational requirements and
the threat environment. In a preferred embodiment of
the invention the failures are logged in non-volatile
memory for later use, such as for System-on-Chip
reliability measurements, threat assessment, and hacker
countermeasures. Furthermore, said non-volatile memory
can advantageously be non-reversing or "one time
programmable", preventing tampering by hackers.
[0026] In a preferred embodiment of the patent, the
detection of failures will cause the restart of the job, said
job being repeated until identical results have been
obtained consecutively for a predefined number of
repetitions. After a maximum number of retries, however, the
job and the associated process(es) shall be aborted.
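
The retry policy above can be sketched as follows. The parameter names `required_matches` and `max_runs`, their default values and the `JobAborted` exception are assumptions; the patent leaves the concrete numbers open.

```python
from typing import Any, Callable


class JobAborted(Exception):
    """Hypothetical signal that the job and its process must be aborted."""


def run_until_stable(job: Callable[[Any], Any], data: Any,
                     required_matches: int = 2, max_runs: int = 6) -> Any:
    """Repeat `job` until `required_matches` consecutive executions agree."""
    streak = 0      # number of consecutive identical results seen so far
    last = None
    for _ in range(max_runs):
        result = job(data)
        if streak > 0 and result == last:
            streak += 1
        else:
            streak, last = 1, result   # disagreement: start counting again
        if streak >= required_matches:
            return result
    raise JobAborted(f"no stable result after {max_runs} executions")
```

A single glitched execution costs one extra run, while a persistent disturbance ends in `JobAborted` rather than an endless loop.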
[0027] In a preferred embodiment of the patent, failure
detection also takes place during each job execution,
the correct progress of said execution being monitored
at several places in the code. This monitoring can
consist of, but is not limited to, the following verifications:

• Verifying that the stack pointer value is in a range
of legal values. If the stack pointer value is illegal
then a failure has occurred.

• Verifying that the current state of operating system
parameters and other variables justifies the execution
of the job. For example, a command handler job
should not execute unless the System-on-Chip is in
the command processing mode.

• Verifying that the current state of operating system
parameters and other variables justifies certain
accesses to different memory areas. For example,
write access to code memory should not be
authorized unless the System-on-Chip is in command
processing mode under the system administrator.

• Verifying that fixed and known values located in
RAM, ROM, and EEPROM memory can be read
correctly. If known values cannot be read correctly
one should not do anything critical.

• Verifying that the return addresses from interrupt
handlers and functions have legal values, e.g. that
they are within legal ranges.

• Verifying that the time spent on executing a program
is within a legal range, based on values read from
a clock cycle counter. For example, a spy-safe
decryption code will take a fixed number of cycles to
execute. Therefore, if the execution cycle count of
said code is not as predicted, then a failure has
occurred during that decryption.

• Verifying sensor flags to make sure that the
System-on-Chip is operating within its design envelope.
Some of the conditions hardware sensors may
detect include but are not limited to a) voltage too high,
b) voltage too low, c) clock frequency too high, d)
clock frequency too low, e) light present on chip, f)
temperature too high, g) temperature too low.
Hardware sensors are often the only built-in
security mechanisms in a secure chip, and these
sensors shall be checked to be okay at random
intervals, regular intervals (e.g. every millisecond), and
before and during all critical operations. Experience
has shown, however, that these sensors are not
capable of detecting all failures.

• Verifying that a checkpoint counter has the correct
value. A checkpoint counter is typically set to a
predetermined value at the beginning of a program,
and is thereafter verified and modified at several
places in the program, said places having the
characteristic that they must be "visited" in a certain
order. If the checkpoints are not "visited" in the correct
order then a failure has occurred.
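
Of the verifications listed above, the checkpoint counter lends itself to a short sketch. The class `CheckpointCounter`, the counter values and the `skip_first_step` switch (which simulates a fault that jumps over part of the program) are all hypothetical.

```python
class CheckpointError(Exception):
    """Hypothetical signal that checkpoints were visited out of order."""


class CheckpointCounter:
    def __init__(self, start: int) -> None:
        self.value = start   # predetermined value set at program start

    def visit(self, expected: int, new_value: int) -> None:
        """Verify the counter, then modify it for the next checkpoint."""
        if self.value != expected:
            raise CheckpointError(
                f"expected {expected:#x}, found {self.value:#x}")
        self.value = new_value


def process_command(skip_first_step: bool = False) -> str:
    counter = CheckpointCounter(start=0x10)
    if not skip_first_step:
        # ... first step of the program, e.g. authentication ...
        counter.visit(expected=0x10, new_value=0x21)
    # ... second step, which must only run after the first ...
    counter.visit(expected=0x21, new_value=0x32)
    # ... final step ...
    counter.visit(expected=0x32, new_value=0x43)
    return "done"
```

If execution reaches the second checkpoint without passing the first, the counter still holds `0x10` and the mismatch is reported before the protected step can take effect.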

[0028] In a preferred embodiment of the patent, the
secure chip's illegal operation detection mechanisms
are tested. The illegal operation detectors usually
include but are not limited to illegal opcode detection and
illegal memory access detection. In many secure chips
the illegal operation detections put the chip in a locked
state, requiring a hardware reset. This means that illegal
opcode detections cannot be tested continuously,
because they upset the normal operation of the chip.
However, during personalization and at regular intervals
(e.g. once per day) such tests can advantageously be
performed. To do these tests, non-volatile memory based
state machine variables are used, said variables being
able to retain information during resets, thereby allowing
the test procedure to include one or more resets.
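
One possible shape of such a reset-spanning test is sketched below. This is a simulation under strong assumptions: non-volatile memory is modelled as a dictionary that survives the simulated reset, the locked state is modelled as an exception, and the state names and functions are hypothetical.

```python
IDLE, ARMED, PASSED, FAILED = "IDLE", "ARMED", "PASSED", "FAILED"


class ChipLocked(Exception):
    """Simulates the locked state that only a hardware reset clears."""


def trigger_illegal_opcode(detector_works: bool) -> None:
    # A working detector locks the chip; a broken one lets execution continue.
    if detector_works:
        raise ChipLocked()


def start_detector_test(nvm: dict, detector_works: bool) -> None:
    nvm["state"] = ARMED      # persisted before provoking the detector
    trigger_illegal_opcode(detector_works)
    nvm["state"] = FAILED     # only reached if the chip did not lock


def boot(nvm: dict) -> None:
    """Runs after every reset and resumes the test from the stored state."""
    if nvm.get("state", IDLE) == ARMED:
        nvm["state"] = PASSED  # the reset proves the detector fired


def simulate(detector_works: bool) -> str:
    nvm: dict = {}
    try:
        start_detector_test(nvm, detector_works)
    except ChipLocked:
        pass                   # stands in for the hardware reset
    boot(nvm)
    return nvm["state"]
```

Because the state variable is written before the illegal operation is provoked, the boot code can tell a deliberate test reset from an ordinary one.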
[0029] In a preferred embodiment of the patent the
secure chip is disabled, crippled, locked or killed if it deems
itself to be unsuited for continued operation.
[0030] In a preferred embodiment of the patent, all
executable memory areas of the operating system are
checked to contain no dangerous sequences. A
dangerous sequence is a sequence of values starting at a
location which is not an instruction address of the
operating system, but which can cause damages or security
breaches if execution control for whatever reason is
transferred to said location (i.e. the CPU's program
counter is set to point at said location).
[0031] Dangerous sequences usually coincide with
instruction operands, data areas, or unused padded
memory. For example, cryptographic keys may contain
dangerous sequences if placed in executable memory.
The dangerous sequences are usually highly dependent
on the architecture of the CPU and its peripherals.
Typical dangerous sequences include code which can
trigger updates of non-volatile memories, cause I/O
operations, and reconfigure the memory management
unit. A System-on-Chip will typically have a few dozen
dangerous sequences that should be avoided.
[0032] The checking for absence of dangerous
sequences will usually be done as a part of operating
system development, but in addition the operating system
may dynamically scan all arrivals of new code and data
for dangerous sequences, i.e. operating much like a
virus protection system in a personal computer.
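
A scan for dangerous sequences could be sketched as a byte-pattern search. The patterns below are invented placeholders, since real dangerous sequences depend on the CPU architecture; the function name and the `instruction_addresses` parameter are likewise assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical patterns; real ones are specific to the CPU and its peripherals.
PATTERNS: Dict[str, bytes] = {
    "nvm_write": bytes([0xA5, 0x5A, 0x01]),
    "mmu_reconfig": bytes([0xF0, 0x0F]),
}


def find_dangerous_sequences(
    image: bytes,
    patterns: Dict[str, bytes],
    instruction_addresses: FrozenSet[int] = frozenset(),
) -> List[Tuple[int, str]]:
    """Return (offset, name) for each pattern found outside legitimate code."""
    hits: List[Tuple[int, str]] = []
    for name, pattern in patterns.items():
        offset = image.find(pattern)
        while offset != -1:
            # A match at a genuine instruction address is intended code.
            if offset not in instruction_addresses:
                hits.append((offset, name))
            offset = image.find(pattern, offset + 1)
    return sorted(hits)
```

The same function can be applied to a complete memory image at development time or to each newly arriving block of code or data before it is stored.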

[0033] The invention will be better understood thanks
to the following detailed description which refers to the
attached drawings which are given as a non-limiting
example, in which:

- figure 1 shows the known execution of the jobs of a
process,

- figure 2 shows the execution of the jobs according to an
embodiment of the invention.


[0034] As already explained, a process consists of a
set of jobs which are executed sequentially, each job
being dependent on the result of the preceding job.
Figure 1 shows a process having 3 jobs, from the start
ST to the end E. These jobs J1, J2, J3 are executed
sequentially. If the job J2 is not executed properly, the
input data of the job J3 will not be correct, and neither
will the final result.

[0035] In figure 2, the job J N-1 delivers the input
data for the job J N. In this case, the job J N-1 can be
either a safe job, i.e. no repetition is necessary, or can
be a repeated job.

[0036] The job J N starts with input data IN and should 
produce the expected output data OUT. 

[0037] The same input data IN are given to at least
two jobs JN FE (first execution job) and JN SE (second
execution job) which should produce an equivalent
result. These jobs can be launched in parallel or executed
sequentially. The result of each job is compared in the
verification block VF and the jobs JN FE and JN SE are
repeated until the results are equivalent.
[0038] The verification block VF comprises a counter
to limit the number of repetitions. In case the maximum
number of retries is reached, the execution is
stopped and an exception handler module EH is then
invoked. This module decides the subsequent actions,
e.g. a total reset of the microprocessor, a restart of the
sequence of jobs (J N-1, J N, J N+1) or a disabling of
the chip.

[0039] The present invention is not limited to
small-sized Systems-on-Chip; it can for example also be
embodied in a personal computer in which the operating
system will work as described above. The process is
divided into jobs, for each of which a status defines
whether it should be repeated or only executed once.
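
A process divided into jobs with a per-job status can be sketched as follows. The `Job` data class, its `repeat` flag and the `on_failure` callback standing in for the exception handler EH are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Job:
    name: str
    run: Callable[[Any], Any]
    repeat: bool   # status: True = execute twice and compare, False = once


def run_process(jobs: List[Job], data: Any,
                on_failure: Callable[[Job], Any]) -> Any:
    """Execute jobs sequentially, feeding each validated output to the next job."""
    for job in jobs:
        result = job.run(data)
        if job.repeat and job.run(data) != result:
            return on_failure(job)   # hand over to the exception handler
        data = result
    return data


# Hypothetical three-job process in the spirit of figure 1.
EXAMPLE_PROCESS = [
    Job("J1", lambda x: x + 1, repeat=True),
    Job("J2", lambda x: x * 2, repeat=False),   # treated as a safe job
    Job("J3", lambda x: x - 3, repeat=True),
]
```

With this structure the cost of redundancy is paid only for the jobs whose status requests it.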



Claims 

1. System-on-Chip with microprocessor (MbSoC)
comprising an operating system designed to execute
jobs in a sequential manner, each job having
a set of input data and output data, characterized
in that, it comprises means to repeat jobs at least
twice for each set of input data, and comparison
means to validate the results from repeated jobs by
checking the output data for equivalency, and
means to continue the execution in case of successful
comparison, and means to launch an exception
handler in case of unsuccessful comparison.

2. System-on-Chip according to claim 1, characterized
in that, the comparison is performed during
the execution of the next job, i.e. the job following
the repeated jobs.

3. System-on-Chip according to claim 1 or 2, characterized
in that, the means to repeat jobs are designed
to launch different programs producing
equivalent results.

4. System-on-Chip according to claims 1 to 3, characterized
in that, it comprises means to execute safe
jobs only once, said safe jobs informing the operating
system whether the result is valid or not.

5. System-on-Chip according to claims 1 to 3, characterized
in that, the means to repeat jobs are designed
to repeat the jobs more than twice for additional
security.

6. System-on-Chip according to claim 1 or 2 designed
to execute a first and a second job in a sequential
manner, characterized in that, the second job has
the reverse function of the first job, said second job
using the result from said first job as input, and in
that, the comparison means compare the input of
said first job with the output of said second job.

7. System-on-Chip according to claims 1 to 6, characterized
in that, it comprises means to check during
the execution of a job, one or more storage cells to
have a predetermined value, said checking being
done at one or more places in the program code,
and where unsuccessful checking is reported to the
exception handler.

8. System-on-Chip according to claim 7, characterized
in that, the checked storage cells are return
addresses and/or other data on the stack, and
where checking is performed before using said return
addresses and/or other data.

9. System-on-Chip according to claim 7, characterized
in that, the successful checking of the storage
cells entails the modification of these cells to a further
predetermined value.

10. System-on-Chip according to claims 7 to 9, characterized
in that, the storage cells are defined as
a timer register or a clock cycle counter to advance
according to certain criteria, said checking being
done at one or more places in the program code,
and where successful checking indicates no failures
and that job execution may continue, and
where unsuccessful checking is reported to the
exception handler.

11. System-on-Chip according to claims 1 to 6, characterized
in that, it comprises control means which
check sensor flags during the execution of a job,
said checking being done at one or more places in
the program code, and where successful checking
indicates no failures and that job execution may
continue, and where unsuccessful checking is reported
to the exception handler.

12. System-on-Chip according to claims 1 to 6, characterized
in that, the control means check during the
execution of a job, accesses to different memory areas
to be legal for the current operating mode, said
checking being done at one or more places in the
program code, and where successful checking indicates
no failures and that job execution may continue,
and where unsuccessful checking is reported
to the exception handler.

13. System-on-Chip according to claims 1 to 6, characterized
in that, the control means check the illegal
operation detectors that lock the chip in case of
illegal operation detection, and in that, the state
machine variables controlling said tests reside in
programmable non-volatile memory.

14. System-on-Chip according to one of the preceding
claims, characterized in that, the job creates a signature
on the output data, and that, the comparison
means compare the signature of the repeated jobs.

15. System-on-Chip according to one of the preceding
claims, characterized in that, the exception handler
includes means to store logging of failures in
non-volatile memory for discretionary use, including
reliability statistics and anti-hacking countermeasures.

16. System-on-Chip according to one of the preceding
claims, characterized in that, the exception handler
comprises means to retry the unsuccessful job
until a valid result has been obtained, up to a specified
maximum number of retries.

17. System-on-Chip according to one of the preceding
claims, characterized in that, the exception handling
includes a microprocessor reset, said reset
being generated internally by the microprocessor itself,
or being requested explicitly to circuits external
to the System-on-Chip, or requested implicitly by
locking the System-on-Chip.

18. System-on-Chip according to one of the preceding
claims, where the System-on-Chip is disabled, crippled
or killed in case a dangerous non-intermittent
failure condition is detected, and in that said disabling,
crippling or killing includes modifying all or
parts of non-volatile memory.

19. System-on-Chip according to the preceding claims,
characterized in that, it comprises means to scan
the executable memory in order to determine dangerous
sequences, i.e. sequences that can cause
damages or security breaches if executed, and in
that, it further comprises means to erase or modify
such sequences or to issue a warning message to
a system administrator.

20. System-on-Chip according to the preceding claims,
characterized in that, it comprises means to scan
downloaded code and data in order to detect dangerous
sequences, i.e. sequences that can cause
damages or security breaches if executed, and in
that, it further comprises means to erase or modify
such sequences or to issue a warning message to
a system administrator.